

POSSIBLE MODIFICATIONS OF
THE HISSE MODEL FOR PURE
LANDSAT AGRICULTURAL DATA

by 



assumption of class conditional independence of LANDSAT spectral measurements
within the same patch (field). Theoretical arguments are given which show
that any significant refinement of the model beyond Feiveson's proposal will
not allow the reduction, essential to HISSE, of the pure data to patch summary
statistics. A slight alteration of the new model is shown to be a reasonable
approximation to the model which describes pure data elements from the same
patch as jointly Gaussian with a covariance function which exhibits exponential
decay with respect to spatial separation.
1. The Basic HISSE Model and Its Modifications.

The original mathematical assumptions underlying HISSE are fully described
in [7]. Briefly, they are:

a) The sampled pure pixels are organized into p patches (fields)
and corresponding to each patch j, there is a set of spectral
data measurements $X_j = (X_{j1}, \ldots, X_{jN_j})$, where $X_{jk}$ is the
(perhaps multitemporal) vector of spectral data from the kth pixel
in the patch. For each patch j, there is also an unknown
class designation $\theta_j \in \{1, \ldots, m\}$, where m is known.

b) The pairs $\{(X_j, \theta_j)\}_{j=1}^{p}$ are treated as independent random variables.
The $\theta_j$ have a common unknown discrete distribution

$$\mathrm{Prob}[\theta_j = i] = \alpha_i > 0, \quad \text{where } \sum_{i=1}^{m} \alpha_i = 1.$$


c) Given that $\theta_j = i$, $X_{j1}, \ldots, X_{jN_j}$ are independently normally
distributed with unknown mean $\mu_i$ and unknown variance-covariance
matrix $\Sigma_i$.

A proposed modification due to A. Feiveson [3] introduces one additional
matrix parameter for each class. Assumption (c) is changed to

c') Given that $\theta_j = i$, $X_{jk} = \mu_i + e_j + d_{jk}$, where
$E(e_j) = E(d_{jk}) = 0$, $\mathrm{var}(e_j) = \Psi_i$, $\mathrm{var}(d_{jk}) = \Sigma_i$, and the
$e_j$'s and $d_{jk}$'s are independent normal random variables. Thus
the elements $X_{j1}, \ldots, X_{jN_j}$ of $X_j$ are jointly normal with
marginal distributions $X_{jk} \sim N(\mu_i, \Sigma_i + \Psi_i)$ and constant
within-patch covariance $\mathrm{cov}(X_{jk}, X_{jl}) = \Psi_i$, for $k \neq l$.
Notice that the original assumption (c) is a limiting case of (c') obtained
by allowing $\Psi_i \to 0$.

For reasons discussed later, we will alter (c') to

(c") The constant within-patch covariance for elements of the
jth patch is $\mathrm{cov}(X_{jk}, X_{jl}) = \Psi_i / N_j$.

The effect of (c") is that data elements from large patches are considered more
weakly correlated than those from small patches. Assumption (c') is perhaps
more appropriate if the correlation between pixels of the same patch is really
independent of their spatial separation, while (c") is better if the correlation
falls off rapidly with spatial separation, on account of the preponderance of
spatially distant pairs in larger patches. Calculations are presented in Section
4 to suggest that (c") is a reasonable approximation to the average covariance
between pairs when the correlation decreases exponentially with spatial separation.
In Section 3 theoretical arguments are given which severely restrict the covariance
models for which the patch mean vector and scatter matrix are sufficient statistics
without, however, eliminating (c') and (c"). This is an important consideration,
since procedures like HISSE are feasible only if the spectral information in patches
can be summarized in a small number of statistics.
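The contrast between (c') and (c") can be sketched numerically. The following is a minimal illustration, assuming scalar spectral measurements (n = 1) and assuming that under (c") the shared patch effect has variance psi / N_j; the function names and the labels "c1" and "c2" (standing for (c') and (c")) are hypothetical and not part of HISSE.

```python
def patch_effect_variance(psi, n_pixels, model):
    """Within-patch covariance: variance of the effect shared by all pixels."""
    if model == "c1":          # model (c'): constant, independent of patch size
        return psi
    if model == "c2":          # model (c"): inversely proportional to patch size
        return psi / n_pixels
    raise ValueError("model must be 'c1' or 'c2'")


def within_patch_correlation(sigma2, psi, n_pixels, model):
    """Correlation between two distinct pixels of the same patch."""
    cov = patch_effect_variance(psi, n_pixels, model)
    return cov / (sigma2 + cov)


def patch_mean_variance(sigma2, psi, n_pixels, model):
    """Variance of the patch mean: shared effect plus averaged pixel noise."""
    cov = patch_effect_variance(psi, n_pixels, model)
    return cov + sigma2 / n_pixels


for size in (4, 25, 100, 400):
    print(size,
          round(within_patch_correlation(1.0, 0.5, size, "c1"), 4),
          round(within_patch_correlation(1.0, 0.5, size, "c2"), 4))
```

Under these assumptions the correlation stays fixed for "c1" and decays with patch size for "c2", which is the qualitative behavior described above.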


2. The Covariance Models.

The likelihood function and iterative procedure for the current version of
HISSE are given in [7] and will not be repeated here. For covariance models
(c') and (c"), the log likelihood function is


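$$L = \log \mathcal{L} = \sum_{j=1}^{p} \log f(X_j), \qquad f(X_j) = \sum_{i=1}^{m} \alpha_i f_i(X_j),$$

where for model (c')

$$f_i(X_j) = |\Sigma_i|^{-(N_j-1)/2} \, |\Sigma_i + N_j \Psi_i|^{-1/2} \exp\{-\tfrac{1}{2} Q_i(X_j)\},$$

$$Q_i(X_j) = \mathrm{tr}(\Sigma_i^{-1} S_j) + N_j (m_j - \mu_i)^T (\Sigma_i + N_j \Psi_i)^{-1} (m_j - \mu_i),$$

while for model (c")

$$f_i(X_j) = |\Sigma_i|^{-(N_j-1)/2} \, |\Psi_i + \Sigma_i|^{-1/2} \exp\{-\tfrac{1}{2} Q_i(X_j)\},$$

$$Q_i(X_j) = \mathrm{tr}(\Sigma_i^{-1} S_j) + N_j (m_j - \mu_i)^T (\Psi_i + \Sigma_i)^{-1} (m_j - \mu_i).$$

In both these expressions, $m_j$ and $S_j$ are, respectively, the patch mean and
scatter

$$m_j = \frac{1}{N_j} \sum_{k=1}^{N_j} X_{jk}, \qquad S_j = \sum_{k=1}^{N_j} (X_{jk} - m_j)(X_{jk} - m_j)^T.$$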
Thus for both of these covariance models the patch mean and scatter are jointly
sufficient.
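The sufficiency claim can be checked numerically in a small case. The sketch below assumes scalar measurements (n = 1), a within-patch covariance `between` shared by all pixel pairs, and a scatter taken about the patch mean; it compares a direct evaluation of the joint normal log density with one that uses only the patch size, mean and scatter. All function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    size = len(a)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def direct_logpdf(x, mu, sigma2, between):
    """Joint normal log density of one patch from its full covariance matrix."""
    size = len(x)
    cov = [[between + (sigma2 if i == j else 0.0) for j in range(size)]
           for i in range(size)]
    low = cholesky(cov)
    z = []
    for i in range(size):  # forward substitution: low z = x - mu
        z.append((x[i] - mu - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(size))
    return -0.5 * (size * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))


def summary_logpdf(size, mean, scatter, mu, sigma2, between):
    """Same log density computed from the patch size, mean and scatter only."""
    total = sigma2 + size * between
    q = scatter / sigma2 + size * (mean - mu) ** 2 / total
    return -0.5 * (size * math.log(2.0 * math.pi)
                   + (size - 1) * math.log(sigma2) + math.log(total) + q)


patch = [1.0, 2.5, 0.3, 4.1]
mean = sum(patch) / len(patch)
scatter = sum((v - mean) ** 2 for v in patch)
print(direct_logpdf(patch, 1.5, 2.0, 0.7))
print(summary_logpdf(len(patch), mean, scatter, 1.5, 2.0, 0.7))
```

Setting `between = psi` corresponds to (c') and `between = psi / len(patch)` to (c") under the stated assumptions; in both cases the two evaluations agree, so two patches with the same size, mean and scatter have the same density.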

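The unconstrained likelihood equations for model (c") have the form

(1.1) $$\alpha_i = \frac{1}{p} \sum_{j=1}^{p} \frac{\alpha_i f_i(X_j)}{f(X_j)}$$

(1.2) $$\mu_i = \frac{\sum_{j=1}^{p} \frac{f_i(X_j)}{f(X_j)} N_j m_j}{\sum_{j=1}^{p} \frac{f_i(X_j)}{f(X_j)} N_j}$$

(1.3) $$\Sigma_i = \frac{\sum_{j=1}^{p} \frac{f_i(X_j)}{f(X_j)} S_j}{\sum_{j=1}^{p} \frac{f_i(X_j)}{f(X_j)} (N_j - 1)}$$

(1.4) $$\Omega_i = \frac{\sum_{j=1}^{p} \frac{f_i(X_j)}{f(X_j)} N_j (m_j - \mu_i)(m_j - \mu_i)^T}{\sum_{j=1}^{p} \frac{f_i(X_j)}{f(X_j)}}$$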


where the new parameter $\Omega_i$ is defined as $\Sigma_i + \Psi_i$.

The expressions on the right of equations (1.1) - (1.4) are appealing in
that they are averages of quantities whose expectations, given $\theta_j = i$, are
the parameters on the left. In addition, the successive substitutions scheme
suggested by equations (1.1) - (1.4) is a slight variation of the generalized
EM procedure of Dempster, Laird, and Rubin [2]. For covariance model (c'),
the likelihood equations do not suggest a natural iterative procedure and it
appears that the generalized EM procedure has no simple formulation.
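A successive substitutions scheme of this kind can be sketched as follows. This is an illustration only, not the HISSE implementation: it assumes scalar measurements (n = 1), a density for model (c") of the form $\sigma^{-(N_j-1)} \omega^{-1/2} \exp\{-\tfrac{1}{2}[S_j/\sigma^2 + N_j(m_j-\mu)^2/\omega]\}$ with $\omega = \sigma^2 + \psi$, and updates obtained by setting the derivatives of the log likelihood to zero. The function names, the simulated data and the starting values are all hypothetical.

```python
import math
import random


def log_density(n, mean, scatter, mu, sigma2, omega):
    """Log of the assumed class density for one patch, up to a constant."""
    return -0.5 * ((n - 1) * math.log(sigma2) + math.log(omega)
                   + scatter / sigma2 + n * (mean - mu) ** 2 / omega)


def substitution_step(patches, params):
    """patches: list of (N_j, m_j, S_j); params: list of (alpha, mu, sigma2, omega)."""
    weights = []  # posterior class weights alpha_i f_i(X_j) / f(X_j)
    for n, mean, scatter in patches:
        logs = [math.log(a) + log_density(n, mean, scatter, mu, s2, om)
                for a, mu, s2, om in params]
        top = max(logs)  # subtract the maximum to avoid underflow
        terms = [math.exp(v - top) for v in logs]
        total = sum(terms)
        weights.append([t / total for t in terms])
    new_params = []
    for i in range(len(params)):
        w = [row[i] for row in weights]
        alpha = sum(w) / len(patches)
        mu = (sum(wj * n * m for wj, (n, m, _) in zip(w, patches))
              / sum(wj * n for wj, (n, _, _) in zip(w, patches)))
        sigma2 = (sum(wj * s for wj, (_, _, s) in zip(w, patches))
                  / sum(wj * (n - 1) for wj, (n, _, _) in zip(w, patches)))
        # This sketch plugs the freshly updated mean into the omega update.
        omega = (sum(wj * n * (m - mu) ** 2 for wj, (n, m, _) in zip(w, patches))
                 / sum(w))
        new_params.append((alpha, mu, sigma2, omega))
    return new_params


def simulate_patch(rng, n, mu, sigma2, psi):
    """Draw one patch under the assumed model and return (N_j, m_j, S_j)."""
    effect = rng.gauss(0.0, math.sqrt(psi / n))
    pixels = [mu + effect + rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
    mean = sum(pixels) / n
    return n, mean, sum((x - mean) ** 2 for x in pixels)


rng = random.Random(0)
patches = [simulate_patch(rng, rng.randint(5, 20), mu, 1.0, 1.0)
           for mu in (0.0, 10.0) for _ in range(20)]
params = [(0.5, 2.0, 1.0, 4.0), (0.5, 8.0, 1.0, 4.0)]
for _ in range(25):
    params = substitution_step(patches, params)
print(params)
```

Only the patch summaries enter the iteration, which is the practical point of sufficiency. No constraint on omega minus sigma2 is enforced here, so the sketch corresponds to the unconstrained problem.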

To be consistent with the original interpretation of the parameter $\Psi_i$
as a variance-covariance matrix, it is necessary to maximize the likelihood
subject to the additional inequality constraint $\Psi_i \geq 0$. Since a solution of
equations (1.1) - (1.4) need not satisfy this constraint, maximizing the
likelihood subject to $\Psi_i \geq 0$ requires a much more complicated numerical procedure.
The condition $\Psi_i \geq 0$ is equivalent to a set of scalar inequality and nonlinear
equality constraints, and numerical procedures for such problems are generally
very slow to converge. The unconstrained maximum likelihood procedure is
appropriate if, as in (c"), we merely assume that $\mathrm{cov}(X_{jk}, X_{jl})$ is the same for all
k and l, without introducing random variables $e_j$ and $d_{jk}$.


3. Covariance Models for Which the Mean and Scatter Are Sufficient.

Let $X = (X_1, \ldots, X_N)$ be a matrix whose columns are jointly normally
distributed n-vectors. We are interested in characterizing those families
of distributions of X for which the statistic (m, S) is sufficient,
where $m = X_1 + \cdots + X_N$ and $S = X_1 X_1^T + \cdots + X_N X_N^T$. We begin by recalling
the following definitions [4, p. 32].


Definition: Let G be a group of homeomorphisms on $\mathbb{R}^k$. A function T
defined on $\mathbb{R}^k$ is invariant under G if T(gx) = T(x) for all $x \in \mathbb{R}^k$, $g \in G$.
T is a maximal invariant of G if T is invariant and T(x) = T(y) implies
that there is a $g \in G$ such that y = gx. A measure $\lambda$ is invariant under G
if $\lambda g = \lambda$ for all $g \in G$, where $\lambda g(E) = \lambda(g(E))$.


Lemma 1: Let elements of $\mathbb{R}^{nN}$ be represented as $x = (x_1, \ldots, x_N)$ and let
$e = (1, \ldots, 1)^T \in \mathbb{R}^N$. For each $N \times N$ orthogonal matrix u satisfying
ue = e, let $g_u(x) = xu$. Then $T(x) = (m, S) = (xe, xx^T)$ is a maximal invariant
of the group $G = \{g_u\}$.


Proof: $T(g_u x) = (xue, xu(xu)^T) = (xe, xx^T) = T(x)$. Thus T is invariant.
Suppose that T(x) = T(y) so that xe = ye and $xx^T = yy^T$. If $x^{(i)}$ and $y^{(i)}$
denote the ith rows of x and y then $x^{(i)} x^{(j)T} = y^{(i)} y^{(j)T}$ and $x^{(i)} e = y^{(i)} e$
for all i and j. This implies that corresponding rows of x and y have the
same Euclidean norm and form the same angle with the vector $e^T$. In addition, the
rows of x describe the same set of angles in $\mathbb{R}^N$ as do the corresponding rows
of y. Thus, by carrying out parallel Gram-Schmidt procedures on
$\{e^T, x^{(1)}, \ldots, x^{(n)}\}$ and $\{e^T, y^{(1)}, \ldots, y^{(n)}\}$, it is easy to construct an
orthogonal matrix u such that $e^T u = e^T$ and $x^{(i)} u = y^{(i)}$ for each i; that
is, such that $y = g_u x$. Therefore T is a maximal invariant.
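The invariance half of Lemma 1 is easy to check numerically. The sketch below is an illustration under stated assumptions: it builds one particular orthogonal matrix u with ue = e as a Householder reflection along a vector whose entries sum to zero (a convenient choice, not a construction taken from the text) and verifies that (xe, xx^T) is unchanged by x -> xu. The function names are hypothetical.

```python
def reflection_fixing_ones(v):
    """Householder reflection u = I - 2 v v^T / (v^T v).

    u is orthogonal, and u e = e whenever the entries of v sum to zero.
    """
    norm2 = sum(t * t for t in v)
    size = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm2
             for j in range(size)] for i in range(size)]


def times(x, u):
    """Matrix product x u for matrices stored as lists of rows."""
    return [[sum(row[k] * u[k][j] for k in range(len(u)))
             for j in range(len(u[0]))] for row in x]


def statistic(x):
    """T(x) = (x e, x x^T) for an n x N matrix x."""
    m = [sum(row) for row in x]
    s = [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]
    return m, s


x = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 5.0, 2.0]]   # n = 2, N = 4
u = reflection_fixing_ones([1.0, -2.0, 0.5, 0.5])   # entries sum to zero
y = times(x, u)
print(statistic(x))
print(statistic(y))
```

The matrix y differs from x, yet the two statistics coincide up to rounding, as the first line of the proof asserts.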
Example: Any linear function T defined on $\mathbb{R}^k$ is a maximal invariant under
the group of translations by elements of the kernel of T. In fact, most of
the results in [6] characterizing linear sufficient statistics depend only on
this aspect of linearity.

If T is a maximal invariant then any invariant function on $\mathbb{R}^k$ is a
function of T(x). Moreover, a function $h \circ T$ on $\mathbb{R}^k$ is a maximal invariant
if and only if h is one to one on the range of T. In the theorems which
follow we shall require that T be a continuous open mapping, in addition
to being a maximal invariant. The following lemma shows that to some extent T
may be chosen for convenience, without affecting the property of openness.

Lemma 2: Let V be an open subset of $\mathbb{R}^k$, let G be a group of homeomorphisms
from V to V and let $T_1$ and $T_2$ be continuous maximal invariants of G
defined on V with values in $\mathbb{R}^q$. If $T_1$ is an open mapping then so is $T_2$.

Proof: Since $T_1$ and $T_2$ are maximal invariants, there is a one to one
function $h: T_1(V) \to T_2(V)$ such that $T_2 = hT_1$. Since $h^{-1} = T_1 T_2^{-1}$ on $T_2(V)$,
$T_2$ is continuous and $T_1$ is open, h is continuous. By the Brouwer invariance
of domain theorem [5, p. 3] h is an open mapping. Therefore, $T_2$ is also open.

Theorem 1: Let V be an open subset of $\mathbb{R}^k$, let $\mathcal{M}$ be a homogeneous collection
of finite Borel measures on $\mathbb{R}^k$, and let $\lambda$ be a fixed element of $\mathcal{M}$. Suppose
that $\lambda(V^c) = 0$ and $\lambda(U) > 0$ for each nonempty open subset U of V. Let
G be a group of homeomorphisms from V to V such that $\lambda(gB) = 0$ whenever
$\lambda(B) = 0$ and $g \in G$. Suppose that $f_\mu$ is a continuous representative of $d\mu/d\lambda$
for each $\mu \in \mathcal{M}$ and that $T: V \to \mathbb{R}^q$ is a continuous open maximal invariant of
G. Then T is sufficient for $\mathcal{M}$ if and only if each $f_\mu$ is
invariant under G.

Proof: Suppose that T is sufficient. Then for each $\mu \in \mathcal{M}$ there exists a
Borel measurable function $k_\mu$ such that $k_\mu \circ T$ is a version of $d\mu/d\lambda$ [1].
Let $\mu \in \mathcal{M}$, $g \in G$ be fixed. The set

$$U = \{x \in V \mid f_\mu(x) \neq f_\mu(gx)\}$$

is an open subset of $B \cup g^{-1}(B)$, where

$$B = \{x \in V \mid f_\mu(x) \neq k_\mu(T(x))\}.$$

Since $\lambda(B) = 0$, $\lambda(g^{-1}(B)) = 0$ and $\lambda(U) = 0$. Therefore, U is empty and it follows
that $f_\mu$ is invariant. Conversely, if each $f_\mu$ is invariant, then for each
$\mu \in \mathcal{M}$ there exists a function $h_\mu$ such that $f_\mu = h_\mu \circ T$. Since $f_\mu$ is continuous
and T is open, $h_\mu$ is continuous on T(V). Therefore, by [1, Corollary 5.1]
T is sufficient.

Corollary 1.1: Given the hypotheses of Theorem 1, if $\lambda$ is invariant then T is
sufficient if and only if each $\mu \in \mathcal{M}$ is invariant.

Proof: In general, a density with respect to $\lambda$ of $\mu g$ is $f_{\mu g} = (f_\mu \circ g)h$,
where h is a version of $d\lambda g/d\lambda$. If $\lambda$ is invariant, then we can take h = 1
to obtain $f_{\mu g} = f_\mu \circ g$ as a unique continuous density of $\mu g$, for each $\mu$, g. By
Theorem 1, T is sufficient if and only if $f_{\mu g} = f_\mu$, which is equivalent to
$\mu g = \mu$.
Suppose that $\mu g \in \mathcal{M}$ for each $\mu \in \mathcal{M}$, $g \in G$ and that $\theta$ is an
r-dimensional parametrization of $\mathcal{M}$; i.e., a one to one function onto
$\Theta = \theta(\mathcal{M}) \subset \mathbb{R}^r$. Then there is a homomorphism $g \to \bar{g}$ from G onto a group $\bar{G}$
of transformations on $\Theta$ defined by $\bar{g}(\omega) = \theta(\theta^{-1}(\omega) g)$. The following corollary
is clear.


Corollary 1.2: Given the hypothesis of Theorem 1, if $\lambda$ is invariant then T
is sufficient iff $\bar{G}$ is the trivial group consisting only of the identity mapping
on $\Theta$.


To apply these results to the characterization problem at hand, let
$X = (X_1, \ldots, X_N)$ be a random $n \times N$ matrix having one of a given family of
normal distributions and let $X^{(i)}$ denote the ith row of X. We think of
$X_1, \ldots, X_N$ as being the observed random vectors, but at various times wish to
consider the parameters

$$\mu^{(i)} = E(X^{(i)}), \qquad \Sigma^{(ij)} = \mathrm{cov}(X^{(i)}, X^{(j)}).$$

For the open set V of Theorem 1, we take the set of regular points of
$T(x) = (xe, xx^T)$; that is, the set of points x at which $T'(x)$ is surjective.
$T'(x)$ is surjective if the matrix $\begin{pmatrix} e^T \\ x \end{pmatrix}$ has rank n + 1, which is almost certainly
true for any of the probabilities under consideration as soon as $N \geq n + 1$.
Clearly any of the mappings $g_u$ of Lemma 1 is a homeomorphism from V onto itself
and T is a continuous open mapping on V. $\mathcal{M}$ will be the given set of nN-variate
normal probability measures. The invariant measure $\lambda$ of Corollary 1.1 will
be that given by $\mu^{(i)} = 0$ and $\Sigma^{(ij)} = I$ if $i = j$, $\Sigma^{(ij)} = 0$ if $i \neq j$, which, if not already
a member of $\mathcal{M}$, may be added without affecting the sufficiency of T.

According to Corollary 1.1 and Lemma 1, T is sufficient for $\mathcal{M}$ if and only if

(2.1) $$\mu^{(i)} u = \mu^{(i)}$$

and

(2.2) $$u^T \Sigma^{(ij)} u = \Sigma^{(ij)}$$

for all i, j and $u \in U = \{N \times N$ orthogonal matrices u such that $ue = e\}$.

Now, (2.1) holds if and only if each $\mu^{(i)} = \lambda_i e^T$ for some scalar $\lambda_i$, which
is equivalent to $E(X_1) = \cdots = E(X_N)$. In (2.2) U may be replaced by the larger set
$U' = \{N \times N$ orthogonal matrices u such that $ue = \pm e\}$. Let $P = \frac{1}{N} ee^T$ and
$Q = I - P$. Then U' is the set of all orthogonal matrices which commute with
P, and (2.2) states that each $\Sigma^{(ij)}$ commutes with each $u \in U'$. Let w
be an orthogonal matrix such that

$$wPw^T = \begin{pmatrix} 1 & 0_{1 \times (N-1)} \\ 0_{(N-1) \times 1} & 0_{(N-1) \times (N-1)} \end{pmatrix}.$$

Then U' is the set of all orthogonal matrices u such that $wuw^T$ commutes
with $wPw^T$, and (2.2) holds iff $w\Sigma^{(ij)}w^T$ commutes with $wuw^T$
for each $u \in U'$.

Elementary calculations show that $wuw^T$ must be of the form

$$wuw^T = \begin{pmatrix} \pm 1 & 0 \\ 0 & v \end{pmatrix},$$

where v is $(N-1) \times (N-1)$ orthogonal, and that

$$w\Sigma^{(ij)}w^T = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 I \end{pmatrix}$$

for some scalars $\lambda_1$, $\lambda_2$.
It follows that (2.2) is true iff each $\Sigma^{(ij)}$ is a linear combination of
P and Q. Therefore, (2.2) holds if and only if each $\Sigma^{(ij)}$ has constant
diagonal elements and constant off diagonal elements, which may depend on i
and j. Thus, there are matrices $A = (a_{ij})$ and $B = (b_{ij})$ such that

$$\mathrm{cov}(X^{(i)}_k, X^{(j)}_\ell) = \begin{cases} a_{ij} & \text{if } k = \ell \\ b_{ij} & \text{if } k \neq \ell. \end{cases}$$

That is,

$$\mathrm{var}(X_k) = A \quad \text{for all } k$$

and

$$\mathrm{cov}(X_k, X_\ell) = B \quad \text{if } k \neq \ell.$$

Consequently, A and B are symmetric and we have established

Theorem 2: Let $X_1, \ldots, X_N$ be jointly normally distributed n-vectors whose
joint distribution is a member of a family $\mathcal{M}$. Then the mean and scatter matrix
of the $X_i$'s are sufficient for $\mathcal{M}$ if and only if for each member of $\mathcal{M}$
(a) the $X_i$'s are identically distributed, and (b) $\mathrm{cov}(X_i, X_j)$ is independent
of i and j for $i \neq j$.
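The dichotomy in Theorem 2 can be illustrated in the scalar case (n = 1), where the covariance of the N pixels is a single N x N matrix. The sketch below assumes Householder reflections along zero-sum vectors as convenient examples of orthogonal u with ue = e, and compares a constant-covariance matrix with an exponentially decaying one; all names are hypothetical.

```python
def reflection_fixing_ones(v):
    """Orthogonal u = I - 2 v v^T / (v^T v); u e = e when sum(v) == 0."""
    norm2 = sum(t * t for t in v)
    size = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm2
             for j in range(size)] for i in range(size)]


def conjugate(u, c):
    """Return u^T c u for square matrices stored as lists of rows."""
    size = len(u)
    return [[sum(u[k][i] * c[k][l] * u[l][j]
                 for k in range(size) for l in range(size))
             for j in range(size)] for i in range(size)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j])
               for i in range(len(a)) for j in range(len(a)))


def constant_covariance(size, diag, off):
    """Constant diagonal and constant off-diagonal entries, as in (c') and (c")."""
    return [[diag if i == j else off for j in range(size)] for i in range(size)]


def exponential_decay(size, rho):
    """Covariance rho**|k - l|, decaying with pixel separation."""
    return [[rho ** abs(i - j) for j in range(size)] for i in range(size)]


u = reflection_fixing_ones([1.0, -2.0, 0.5, 0.5])
swap = reflection_fixing_ones([1.0, -1.0, 0.0, 0.0])  # exchanges pixels 1 and 2

const = constant_covariance(4, 2.0, 0.7)
decay = exponential_decay(4, 0.5)
print(max_abs_diff(conjugate(u, const), const))      # invariant up to rounding
print(max_abs_diff(conjugate(swap, decay), decay))   # clearly not invariant
```

Under these assumptions the constant-covariance matrix satisfies condition (2.2) while the decaying one fails it, so for the latter the mean and scatter cannot be sufficient.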


4. Conclusion:

As we mentioned in Section 1, if one thinks of a patch as an approximation
to a field then it is difficult to understand how the within-patch covariance of
spectral measurements from a given patch could be constant but dependent on the
patch size as in (c"). According to the results of Section 3, there is no more
sophisticated covariance model whose parameters can be estimated with optimum
efficiency using only the patch means and scatters; however, there may be
more realistic covariance models which are well approximated by (c') or (c").

For example, suppose that a patch is rectangular in shape with multidimensional
spectral information $\{X_{ij} : i = 1, \ldots, r;\ j = 1, \ldots, s\}$ where i and j denote the
spatial line and column number of the pixel producing $X_{ij}$. Suppose further that
the correlation of two observations $X_{ij}$ and $X_{k\ell}$ decays exponentially with
their spatial separation; that is,

$$\mathrm{cov}(X_{ij}, X_{k\ell}) = A^{|i-k|} B^{|j-\ell|} \Omega,$$

where $\Omega$ is their common variance matrix and A and B are symmetric commuting
matrices of spectral radius less than 1. Let $\bar{\Gamma}$ be the average covariance over
all pairs of distinct pixels. Then a simple calculation shows that for large
r and s (large patch size) $rs\bar{\Gamma}$ is nearly $A(I-A)^{-1}B(I-B)^{-1}\Omega$, so
that $\bar{\Gamma}$ is nearly inversely proportional to the patch size, as is required by (c").
If A and B are positive semidefinite, so that $z^T X_{ij}$ is always positively
correlated with $z^T X_{k\ell}$ for any z, then the expression just given is an upper bound
for the average within-patch covariance for any patch size. Therefore, the effect
of approximating the exponential covariance model with the constant covariance
model (c") may be predictable, and not serious.
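The inverse proportionality to patch size can be explored numerically. The sketch below assumes scalar measurements with unit variance and a separable covariance a**|i-k| * b**|j-l| between pixels (i, j) and (k, l); it computes the average covariance over distinct pixel pairs and prints the product of that average with the patch size. The function names and the decay rates 0.5 and 0.3 are hypothetical.

```python
def line_sum(rho, length):
    """Sum of rho**|i-k| over all ordered index pairs (i, k) in one direction."""
    return length + 2.0 * sum((length - d) * rho ** d for d in range(1, length))


def average_covariance(a, b, rows, cols, variance=1.0):
    """Average covariance over all ordered pairs of distinct pixels."""
    pixels = rows * cols
    # The separable form lets the double sum factor; subtract the identical pairs.
    total = variance * (line_sum(a, rows) * line_sum(b, cols) - pixels)
    return total / (pixels * (pixels - 1))


def scaled_average(a, b, rows, cols):
    """Patch size times the average covariance."""
    return rows * cols * average_covariance(a, b, rows, cols)


for size in (5, 10, 20, 40, 80):
    print(size, round(scaled_average(0.5, 0.3, size, size), 4))
```

Under these assumptions the printed product levels off as the patch grows, so the average covariance itself behaves like a constant divided by the patch size.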